Claude/setup pip install k2 m4j by igerber · Pull Request #4 · igerber/diff-diff

igerber · 2026-01-02T12:51:04Z

No description provided.

Review fixes:
- Add edge-case validation in _compute_flci (se > 0, 0 < alpha < 1)
- Improve significance_stars docstring explaining partial identification
- Standardize error messages to include parameter values (M, Mbar, alpha)
- Make LP solver method configurable in _solve_bounds_lp
- Add clarifying comment about constraint matrix design for pre+post periods
- Improve CallawaySantAnna error message with actionable guidance

Notes:
- #4 (sensitivity_plot export) was verified as valid: the function exists at honest_did.py:1437
- #1 (pre-period effects) verified correct: LP optimization covers all periods, but only post-periods contribute to the objective function

Revised review reflects:
- #1, #4 verified as non-issues (correct by design)
- #3, #5, #6, #8, #13 addressed in commit e40d6b4
- Updated recommendation to approve and merge
- Remaining items are low-priority style suggestions for future PRs

Phase 2 silent-failures audit, axis-G (backend parity). Closes the coverage gap the audit flagged in three Rust-backed solver surfaces. Test-only PR; any discovered divergences are marked `xfail(strict=True)` and logged to `TODO.md` as P1 follow-ups rather than fixed in scope.

Finding #21, `solve_ols` skip-rank-check parity (`linalg.py:369-373, 597-639`): three parity tests in `TestSolveOLSSkipRankCheckParity` covering mixed-scale columns (norm ratio > 1e6), near-singular full-rank designs (cond > 1e10), and rank-deficient collinear designs under `skip_rank_check=True` on HC1. Backends agree on fitted values within `rtol=1e-6, atol=1e-8`. All pass; no Rust-side code change needed.

Finding #22, `compute_synthetic_weights` parity (`utils.py:1134-1199`): three parity tests in `TestSyntheticWeightsBackendParity`. Near-singular `Y'Y` passes at `atol=1e-7`; extreme Y scale (1e9) and lambda_reg variations are `xfail(strict=True)` with a baselined ~15-80% weight divergence. Root cause: the Rust path uses Frank-Wolfe while the Python fallback uses projected gradient descent (`utils.py:1228`); both solve the same QP but land on different simplex vertices under near-degenerate inputs.

Finding #23, TROP Rust grid-search + bootstrap parity (`trop_global.py:688-750, 966-1006`): two parity tests in `TestTROPRustEdgeCaseParity`, marked `@pytest.mark.slow` at class level. Both are `xfail(strict=True)`: grid-search ATT on rank-deficient Y (~6% divergence) and bootstrap SE under `seed=42` (~28% divergence, caused by an RNG backend mismatch: Rust `rand` crate vs numpy `default_rng`).

Plan governance:
- Per `feedback_ci_reviewer_pattern_checks`, grepped adjacent Rust entry points (`_solve_ols_rust`, `_rust_synthetic_weights`, `_rust_loocv_grid_search_global`, `_rust_bootstrap_trop_variance_global`); no additional silent-fallback surfaces identified.
- Per plan Non-goal #4, did not open an axis-H finding on TROP's `seed=None → 0` substitution at `trop_global.py:994` (out of scope).
- No behavioral changes, no warnings, no REGISTRY changes, no flags.

`TODO.md` logs three P1 follow-up entries: algorithmic unification for `compute_synthetic_weights` (FW vs PGD), TROP grid-search divergence on rank-deficient Y, and TROP bootstrap RNG unification.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
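The parity tests compare the Rust and Python backends elementwise under `rtol=1e-6, atol=1e-8`. A pure-Python sketch of that comparison, using the same asymmetric rule as `numpy.allclose` (`|a - b| <= atol + rtol * |b|`), plus a relative-divergence summary of the kind behind the ~6% and ~28% figures above. Both helper names are hypothetical; the real tests operate on numpy arrays.

```python
from typing import Sequence


def backends_agree(rust_out: Sequence[float], py_out: Sequence[float],
                   rtol: float = 1e-6, atol: float = 1e-8) -> bool:
    """Elementwise tolerance check using numpy.allclose's rule.

    The Python result is treated as the reference value `b` in
    |a - b| <= atol + rtol * |b|.
    """
    if len(rust_out) != len(py_out):
        return False
    return all(abs(a - b) <= atol + rtol * abs(b) for a, b in zip(rust_out, py_out))


def max_relative_divergence(rust_out: Sequence[float], py_out: Sequence[float]) -> float:
    """Largest |a - b| / |b| over entries whose reference value is nonzero."""
    ratios = [abs(a - b) / abs(b) for a, b in zip(rust_out, py_out) if b != 0]
    return max(ratios, default=0.0)
```

In a pytest suite, a known divergence wrapped in `pytest.mark.xfail(strict=True)` turns an unexpected pass (XPASS) into a failure, so a future fix that unifies the backends forces the marker to be removed.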

Closes BR/DR foundation gap #4 (real-dataset validation) from the external-positioning gap list in ``project_br_dr_foundation.md``.

Validation artifact:
- ``docs/validation/validate_br_dr_canonical.py`` runs BusinessReport / DiagnosticReport on Card-Krueger (1994), mpdta (Callaway-Sant'Anna 2021 benchmark), and Castle Doctrine (Cheng-Hoekstra 2013 under both CS and SA), dumping summary + full_report + selected to_dict blocks for each.
- ``docs/validation/br_dr_canonical_validation.md`` is the regenerable raw output.
- ``docs/validation/br_dr_canonical_findings.md`` is the hand-written synthesis: direction / verdict / sensitivity tier all match canonical interpretations, with two small wording bugs surfaced and fixed in this PR and two larger gaps queued as follow-up (SA HonestDiD applicability, target-parameter disambiguation).

Wording fixes:
1. Treatment-label capitalization. ``str.capitalize()`` lowercased every character after the first, flattening embedded abbreviations (``"the NJ minimum-wage increase"`` → ``"The nj minimum-wage increase"``) and proper-noun phrases (``"Castle Doctrine law adoption"`` → ``"Castle doctrine law adoption"``). Replaced with a ``_sentence_first_upper`` helper that preserves user-supplied casing.
2. ``breakdown_M == 0`` phrasing. The HonestDiD fragile sentence quoted ``{breakdown_M:.2g}x the pre-period variation``, which renders as a degenerate ``0x`` in the exact-zero case surfaced by Cheng-Hoekstra. At ``breakdown_M <= 0.05`` (covering 0 and near-zero values), both BR's summary and DR's overall_interpretation now say "includes zero even at the smallest parallel-trends violations on the sensitivity grid" instead.

Tests: 5 new regressions in ``TestCanonicalValidationSurfaceFixes`` covering both fixes + three boundary cases (exact zero, small positive, normal fragile value).
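Both wording fixes can be reproduced in a few lines. This is a sketch only: `sentence_first_upper` and `fragile_clause` are hypothetical stand-ins, not the library's ``_sentence_first_upper`` or its actual BR/DR sentence templates; the 0.05 threshold and the replacement phrase are taken from the description above.

```python
def sentence_first_upper(label: str) -> str:
    """Uppercase only the first character and leave the rest untouched.

    Unlike str.capitalize(), which also lowercases every later character,
    this keeps abbreviations such as "NJ" and proper nouns intact.
    """
    return label[:1].upper() + label[1:]


def fragile_clause(breakdown_m: float, threshold: float = 0.05) -> str:
    """Choose the HonestDiD 'fragile' wording without rendering '0x'."""
    if breakdown_m <= threshold:
        return ("includes zero even at the smallest parallel-trends "
                "violations on the sensitivity grid")
    # Format with two significant digits, as the original template did.
    return f"{breakdown_m:.2g}x the pre-period variation"
```

Slicing with `label[:1]` rather than `label[0]` keeps the helper safe on an empty string.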
Not in scope: Favara-Imbs (dCDH reversible-treatment dataset not bundled), ImputationDiD / TwoStageDiD on canonical data (needed to exercise the R42 untreated-outcome FE assumption branch on real data), and the SA HonestDiD applicability gap. All are tracked in the findings doc for follow-up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Close BR/DR gap #4: canonical-dataset regression guards + wording fixes

…nuousDiD prerequisite list as profile-side screening + add first_treat caveat

P1 (the five profile-derived facts are not the "full" gate set): Reviewer correctly noted that calling `{has_never_treated, treatment_varies_within_unit==False, is_balanced, no duplicate_unit_time_rows alert, dose_min > 0}` the "full ContinuousDiD pre-fit gate set" overreaches. `profile_panel` only sees the four columns it accepts and CANNOT see the separate `first_treat` column that `ContinuousDiD.fit()` consumes. Verified against `continuous_did.py:230-360`: `fit()` additionally rejects NaN/inf/negative `first_treat`, drops units with `first_treat > 0` AND `dose == 0`, and force-zeroes `first_treat == 0` rows whose `dose != 0` with a `UserWarning`. A panel that passes all five profile-side checks can still surface warnings, drop rows, or raise at fit time depending on the `first_treat` column the caller supplies. Reframed the wording in five surfaces from "full gate set" to "profile-side screening checks", with an explicit caveat that the checks are necessary but not sufficient and that `ContinuousDiD.fit()` applies separate `first_treat` validation:
- `diff_diff/profile.py` `TreatmentDoseShape` docstring (now spells out the screening framing explicitly + lists the `first_treat` validations that fit() applies).
- `diff_diff/profile.py` `_compute_treatment_dose` helper docstring (aligned with the public contract: most fields are descriptive, `dose_min > 0` is one of the screening checks).
- `diff_diff/guides/llms-autonomous.txt` §2 field reference (rewrote the multi-paragraph block to describe screening + first_treat caveat).
- `diff_diff/guides/llms-autonomous.txt` §4.7 (continuous design feature paragraph: screening checks + necessary-not-sufficient language + pointer to §2).
- `diff_diff/guides/llms-autonomous.txt` §5.2 worked example reasoning chain (rewrote step 2 to call out screening + first_treat caveat; clarified counter-example #4 that `P(D=0) > 0` is required under BOTH `control_group="never_treated"` and `"not_yet_treated"`, not just the default).
- `CHANGELOG.md` Unreleased entry.
- `ROADMAP.md` AI-Agent Track.

P2 (test coverage for the missing `first_treat` caveat): Added a content-stability assertion in `tests/test_guides.py`: `assert "first_treat" in text`, so the autonomous guide cannot silently drop the explicit `first_treat` validation caveat.

P3 (helper / test-name inconsistency with public contract): Renamed `test_treatment_dose_does_not_gate_continuous_did` to `test_treatment_dose_descriptive_fields_supplement_existing_gates` and rewrote its docstring to match the now-honest public contract ("most fields descriptive distributional context that supplements the existing top-level screening checks"). The test body still asserts the same two things (`treatment_varies_within_unit` fires True on `0,0,d,d` paths, and `has_never_treated` is independent of `has_zero_dose`), both of which remain accurate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
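A pure-Python sketch of how the five profile-side facts could be computed from long-format `(unit, time, dose)` rows. The function names and row layout are assumptions for illustration, not `profile_panel`'s implementation, and `dose_min` is assumed to be the smallest nonzero dose. Note that it only ever looks at the dose column, which is exactly why it cannot stand in for `ContinuousDiD.fit()`'s `first_treat` validation.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple

Row = Tuple[str, int, float]  # (unit, time, dose): hypothetical long-format layout


def dose_screening_checks(rows: List[Row]) -> Dict[str, bool]:
    """Compute the five profile-side preflight facts from the dose column alone."""
    counts = Counter((unit, time) for unit, time, _ in rows)
    doses_by_unit: Dict[str, Set[float]] = defaultdict(set)
    for unit, _, dose in rows:
        doses_by_unit[unit].add(dose)
    periods = {time for _, time, _ in rows}
    full_grid = {(unit, time) for unit in doses_by_unit for time in periods}
    # Assumption: dose_min is the smallest NONZERO dose (i.e., among treated rows).
    nonzero = [dose for _, _, dose in rows if dose != 0]
    return {
        # Some unit has zero dose in every period it is observed.
        "has_never_treated": any(d == {0.0} for d in doses_by_unit.values()),
        # Some unit takes more than one distinct dose value over time.
        "treatment_varies_within_unit": any(len(d) > 1 for d in doses_by_unit.values()),
        # Every unit appears in every period.
        "is_balanced": set(counts) == full_grid,
        # Some (unit, time) pair appears more than once.
        "duplicate_unit_time_rows": any(n > 1 for n in counts.values()),
        "dose_min_positive": bool(nonzero) and min(nonzero) > 0,
    }


def passes_preflight(checks: Dict[str, bool]) -> bool:
    """All five screening checks hold: necessary, not sufficient, for fit()."""
    return (
        checks["has_never_treated"]
        and not checks["treatment_varies_within_unit"]
        and checks["is_balanced"]
        and not checks["duplicate_unit_time_rows"]
        and checks["dose_min_positive"]
    )
```

A `0,0,d,d` dose path trips `treatment_varies_within_unit`, matching the behavior the renamed test asserts.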

…light checks as standard-workflow predictions, not estimator gates

Reviewer correctly noted that calling {has_never_treated, treatment_varies_within_unit==False, is_balanced, no duplicate_unit_time_rows alert, dose_min > 0} the "screening checks" / "necessary" gates of `ContinuousDiD` overstates the contract. `ContinuousDiD.fit()` keys off the separate `first_treat` column (which `profile_panel` does not see), defines never-treated controls as `first_treat == 0` rows, force-zeroes nonzero `dose` on those rows with a `UserWarning`, and rejects negative dose only among treated units with `first_treat > 0` (see `continuous_did.py:276-327` and `:348-360`). Two of the five checks (`has_never_treated`, `dose_min > 0`) are first_treat-dependent: agents who relabel positive- or negative-dose units as `first_treat == 0` trigger the force-zero coercion path with a `UserWarning` and may still fit panels that fail those preflights, with the methodology shifting. The other three (`treatment_varies_within_unit`, `is_balanced`, duplicate-row absence) are real fit-time gates that hold regardless of how `first_treat` is constructed. Reframed every wording site to call these "standard-workflow preflight checks": predictive when the agent derives `first_treat` from the same dose column passed to `profile_panel`, but not the estimator's literal contract:
- `diff_diff/profile.py` `TreatmentDoseShape` docstring (rewrote the multi-paragraph block; explicit standard-workflow definition + per-check first_treat dependency map + force-zero coercion caveat).
- `diff_diff/profile.py` `_compute_treatment_dose` helper docstring (already brief; stays consistent).
- `diff_diff/guides/llms-autonomous.txt` §2 field reference (long rewrite covering the standard-workflow framing + override paths).
- `diff_diff/guides/llms-autonomous.txt` §4.7 opening bullet + trailing paragraph (both updated; the opening bullet now spells out which of the five checks are first_treat-dependent vs. hard fit-time stops; the trailing paragraph promotes the standard-workflow framing).
- `diff_diff/guides/llms-autonomous.txt` §5.2 reasoning chain step 2 (rewrote the gate-checking paragraph; counter-example #4 expanded to enumerate (a) supply matching first_treat and accept rejection, (b) deliberate relabel + coercion, (c) different estimator; counter-example #5 distinguishes negative-dose treated-unit rejection from never-treated coercion).
- `CHANGELOG.md` Wave 2 entry (matches the new framing).
- `ROADMAP.md` AI-Agent Track building block (matches).

Test coverage:
- Renamed assertion messages in `test_treatment_dose_descriptive_fields_supplement_existing_gates` and `test_treatment_dose_min_flags_negative_dose_continuous_panels` to remove "authoritative gate" phrasing; reframed as "standard-workflow preflight" assertions consistent with the corrected docs.
- Added `test_negative_dose_on_never_treated_coerces_not_rejects` in `tests/test_continuous_did.py::TestEdgeCases`, covering the reviewer's specific request: never-treated rows with NEGATIVE nonzero dose must coerce (with `UserWarning`) rather than raise the treated-unit negative-dose error. It is a sister to the existing `test_nonzero_dose_on_never_treated_warns`, which covers the positive-dose case.

Rebased onto origin/main during this round (no conflicts beyond prior CHANGELOG resolutions; main advanced 19 commits).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
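A simplified sketch of the `first_treat`-side behavior described in these rounds: non-finite or negative `first_treat` is rejected, nonzero dose on a `first_treat == 0` unit is force-zeroed with a `UserWarning`, negative dose is rejected only among treated units, and zero-dose treated units are dropped. It works on one hypothetical `(unit, first_treat, dose)` record per unit, following the time-invariant-dose setup; it illustrates the described contract and is not `continuous_did.py`'s code, whose messages and ordering may differ.

```python
import math
import warnings
from typing import List, Tuple

Unit = Tuple[str, float, float]  # (unit, first_treat, dose): hypothetical layout


def validate_first_treat(units: List[Unit]) -> List[Unit]:
    """Return the cleaned per-unit records, warning or raising as described."""
    cleaned: List[Unit] = []
    for unit, first_treat, dose in units:
        # Reject NaN, +/-inf, and negative first_treat outright.
        if not math.isfinite(first_treat) or first_treat < 0:
            raise ValueError(f"invalid first_treat={first_treat!r} for unit {unit!r}")
        if first_treat == 0:
            # Never-treated: any nonzero dose (positive OR negative) is coerced.
            if dose != 0:
                warnings.warn(
                    f"unit {unit!r} has first_treat == 0 but dose={dose!r}; setting dose to 0",
                    UserWarning,
                )
                dose = 0.0
        else:
            # Treated units: a negative dose is an error, a zero dose is dropped.
            if dose < 0:
                raise ValueError(f"treated unit {unit!r} has negative dose={dose!r}")
            if dose == 0:
                continue
        cleaned.append((unit, first_treat, dose))
    return cleaned
```

The same negative dose coerces on a `first_treat == 0` unit but raises on a treated one, which is the asymmetry `test_negative_dose_on_never_treated_coerces_not_rejects` pins down; it is implementation behavior for inconsistent inputs, not a routing option.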

…s-fallback wording; correct duplicate-row "fit-time stop" claim

P1 (relabel-to-manufacture-controls misframing): Round 11 introduced wording across the guide, profile docstring, CHANGELOG, ROADMAP, and test docstrings that presented intentional `first_treat == 0` relabeling of nonzero-dose units as an "option" / "fallback" for fitting `ContinuousDiD` when the profile-side preflights (`has_never_treated`, `dose_min > 0`) fail. REGISTRY does not document this as a routing option, and the estimator still requires actual `P(D=0) > 0` because Remark 3.1 lowest-dose-as-control is not yet implemented. The force-zero coercion at `continuous_did.py:311-327` is implementation behavior for INCONSISTENT inputs (e.g., a user accidentally passes a nonzero dose on a never-treated row), not a methodology fallback. Reworded every site to remove the relabeling-as-option framing and replace it with the registry-documented fixes when (1) or (5) fails: re-encode the treatment column to a non-negative scale that contains a true never-treated group, or route to a different estimator (`HeterogeneousAdoptionDiD` for graded-adoption panels; linear DiD with the treatment as a continuous covariate). Every remaining "manufacture controls" mention in the guide, profile, and tests is now an explicit anti-recommendation ("do not relabel ... to manufacture controls"). Updated:
- `diff_diff/profile.py` `TreatmentDoseShape` docstring (item (1): "not an opportunity to relabel ..."; item (5): coercion is "implementation behavior for inconsistent inputs, not a methodological fallback").
- `diff_diff/guides/llms-autonomous.txt` §2 field reference (the When-(1)-or-(5)-fails paragraph names re-encode + alternative estimator only; explicit anti-relabel warning).
- `diff_diff/guides/llms-autonomous.txt` §4.7 opening bullet + trailing paragraph (consolidated; the opening bullet drops the relabel-as-fallback framing; the trailing paragraph is trimmed to a pointer to §2).
- `diff_diff/guides/llms-autonomous.txt` §5.2 step 2 + counter-example #4 + counter-example #5 (relabel-as-option language removed; explicit "do not relabel" callouts; counter-example #4 options trimmed to (a) re-encode and (b) different estimator).
- `CHANGELOG.md` (relabel-as-option clause removed; replaced with re-encode / different-estimator framing).
- `ROADMAP.md` (same).
- `tests/test_profile_panel.py` two test docstrings (relabel-as-workflow language removed).

P2 (duplicate-row "hard fit-time stop" misclaim): Round 11 wording said "duplicate-row failures are hard fit-time stops", which is incorrect. `_precompute_structures` at `continuous_did.py:818-823` silently overwrites with last-row-wins; no exception is raised. Reworded as "hard preflight veto: the agent must deduplicate before fit because `ContinuousDiD` otherwise uses last-row-wins, no fit-time exception" in the profile.py docstring, guide §4.7 opening bullet, and §5.2 step 2 (which now defers to §2 for the breakdown). The previously correct §2 description of the silent-coerce path is preserved.

Length housekeeping: The round-11/round-12 expansion pushed `llms-autonomous.txt` above `llms-full.txt`, breaking `test_full_is_largest`. Trimmed ~2.7KB by consolidating the §4.7 trailing paragraph + §5.2 step 2 trailing block to point at §2's full breakdown rather than duplicating the per-check semantics. autonomous: 65364 chars, full: 66058 chars.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
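Why duplicate rows need a preflight veto rather than a fit-time error: building a `(unit, time) → value` mapping with a plain dict silently keeps the last row for a repeated key and raises nothing. A minimal sketch of that behavior and of a deduplication check an agent could run first; both helpers are hypothetical and are not `_precompute_structures` itself.

```python
from collections import Counter
from typing import Dict, List, Tuple

Row = Tuple[str, int, float]  # (unit, time, value): hypothetical layout


def build_lookup(rows: List[Row]) -> Dict[Tuple[str, int], float]:
    # Later entries overwrite earlier ones: last row wins, silently.
    return {(unit, time): value for unit, time, value in rows}


def duplicate_keys(rows: List[Row]) -> List[Tuple[str, int]]:
    # Keys seen more than once; non-empty output means "deduplicate before fit".
    counts = Counter((unit, time) for unit, time, _ in rows)
    return sorted(key for key, n in counts.items() if n > 1)
```

Raising on a non-empty `duplicate_keys(rows)` before fitting turns the silent overwrite into a visible error.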

… first_treat from dose" framing; add PanelProfile backward-compat defaults; fix test_continuous_did docstring

P1 (canonical ContinuousDiD setup vs. derive-from-dose framing): Round 12 introduced a "standard workflow" description across the guide, profile docstring, CHANGELOG, ROADMAP, and test docstrings that said agents derive `first_treat` from the same dose column passed to `profile_panel`. Reviewer correctly noted this conflicts with the actual ContinuousDiD contract (`continuous_did.py:222-228`, `prep_dgp.py:970-993`, `docs/methodology/continuous-did.md:65-73`): the canonical setup uses a **time-invariant per-unit dose** `D_i` and a **separate `first_treat` column** the caller supplies. The dose column has no within-unit time variation in this setup, so it cannot encode timing. An agent following the rejected framing would either build a `0,0,d,d` path (which `fit()` rejects) or keep a valid constant-dose panel (in which case the dose column carries no timing information). Reworded every site to drop the derive-from-dose framing and replace it with the canonical setup. The five facts on the dose column remain predictive of `fit()` outcomes BECAUSE the canonical convention ties `first_treat == 0` to `D_i == 0` and treated units carry their constant dose across all periods. As a result, `has_never_treated` proxies `P(D=0) > 0` and `dose_min > 0` predicts the strictly-positive-treated-dose requirement, without any "derivation" of `first_treat` from the dose column. Updated:
- `diff_diff/profile.py` `TreatmentDoseShape` docstring (rewrote the multi-paragraph block to use the canonical-setup framing and added an explicit "agent must validate `first_treat` independently" note).
- `diff_diff/guides/llms-autonomous.txt` §2 field reference.
- `diff_diff/guides/llms-autonomous.txt` §4.7 opening bullet.
- `diff_diff/guides/llms-autonomous.txt` §5.2 reasoning chain step 2 + counter-examples #4 and #5 (now describe the canonical setup rather than a derive-from-dose workflow).
- `CHANGELOG.md` Wave 2 entry.
- `ROADMAP.md` AI-Agent Track building block.
- `tests/test_profile_panel.py` `test_treatment_dose_min_flags_negative_dose_continuous_panels` docstring/comments.

P2 (PanelProfile direct-construction backward compat): Wave 2 added `outcome_shape` and `treatment_dose` to PanelProfile without defaults, breaking direct `PanelProfile(...)` calls that predate Wave 2. Made both fields default to `None` (moved them to the end of the field list; both are `Optional[...]`). Added `test_panel_profile_direct_construction_without_wave2_fields`, asserting that direct construction without the new fields succeeds and yields `None` defaults that serialize correctly through `to_dict()`.

P3 (test_continuous_did.py docstring overstating sanction): The new `test_negative_dose_on_never_treated_coerces_not_rejects` docstring said the contract "lets agents legally relabel negative-dose units as `first_treat == 0` to coerce them away." Reworded as observed implementation behavior for inconsistent inputs, NOT a sanctioned routing option; the test locks in the coercion contract while the autonomous guide §5.2 explicitly tells agents not to use this path methodologically.

Length invariant maintained: autonomous (65748 chars) < full (66031 chars); `test_full_is_largest` still passes (it compares character count, not byte count, so on-disk size with UTF-8 multi-byte characters differs from the assertion target).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
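The backward-compatibility fix follows a standard dataclass rule: fields with defaults must come after fields without them (otherwise class creation raises `TypeError`), so new `Optional` fields go at the end with `None` defaults. The sketch below assumes `PanelProfile` is a dataclass, which the "field list" and `to_dict()` wording suggests but does not confirm. `PanelProfileSketch` and its `n_units` / `n_periods` fields are hypothetical placeholders; only `outcome_shape` and `treatment_dose` come from the PR text, and their types are simplified to `Optional[dict]`.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class PanelProfileSketch:
    # Pre-existing required fields (placeholder names).
    n_units: int
    n_periods: int
    # Later additions: Optional with None defaults and placed last, so calls
    # written before these fields existed keep working unchanged.
    outcome_shape: Optional[Dict[str, Any]] = None
    treatment_dose: Optional[Dict[str, Any]] = None

    def to_dict(self) -> Dict[str, Any]:
        # asdict recurses into nested dataclasses; None serializes as None.
        return asdict(self)
```

A call like `PanelProfileSketch(10, 4)` that predates the new fields still constructs, and `to_dict()` reports the new keys as `None`.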

…fixes" overclaim for ContinuousDiD recoding

P1 (overclaiming registry endorsement of recoding): Reviewer correctly noted that the round-13/14 wording across the public-facing surfaces called re-encoding the treatment column a "registry-documented fix" / "documented option" / "documented fallback". REGISTRY only documents the `P(D=0) > 0` requirement and explicitly notes that Remark 3.1's lowest-dose-as-control fallback is NOT implemented in this library. Re-encoding is an agent-side preprocessing choice that the registry neither endorses nor forbids, so calling it "registry-documented" was an over-claim. Reworded twelve sites to drop the "documented" framing:
- `diff_diff/profile.py` `TreatmentDoseShape` docstring (items (1) and (5)).
- `diff_diff/guides/llms-autonomous.txt` §2 field reference When-(1)-or-(5)-fails paragraph.
- `diff_diff/guides/llms-autonomous.txt` §4.7 opening bullet trailing language.
- `diff_diff/guides/llms-autonomous.txt` §4.7 trailing paragraph (consolidated to a pointer at §2; reduced redundancy).
- `diff_diff/guides/llms-autonomous.txt` §5.2 reasoning chain counter-example #4.
- `tests/test_profile_panel.py` two test docstrings + one inline assertion message + one trailing comment.
- `CHANGELOG.md` Wave 2 entry.
- `ROADMAP.md` AI-Agent Track building block.

The corrected framing across all surfaces:
- Honestly state the contract: `ContinuousDiD` requires `P(D=0) > 0` and positive treated doses; Remark 3.1 is not implemented.
- When the contract isn't met, say `ContinuousDiD` "as currently implemented does not apply", not "do this fix."
- Mention routing alternatives that ARE in the library and DON'T require `P(D=0) > 0`: `HeterogeneousAdoptionDiD`, and linear DiD with a continuous covariate. Those are routing facts, not methodology endorsements.
- Re-encoding stays in the prose as an "agent-side preprocessing choice that changes the estimand and is not documented in REGISTRY as a supported fallback", explicitly NOT endorsed.

Length housekeeping: trimmed redundancy in the §4.7 trailing paragraph (consolidated to a pointer at §2) and tightened the §2 recoding paragraph; autonomous (65984 chars) < full (66031), `test_full_is_largest` green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
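The length invariant behind `test_full_is_largest` compares character counts rather than UTF-8 byte counts (as an earlier commit message on this page notes). The two measures diverge whenever a file contains non-ASCII characters such as `§`. A hypothetical sketch of the distinction; the helper names are illustrative and do not reproduce the actual test.

```python
from typing import Tuple


def char_and_byte_lengths(text: str) -> Tuple[int, int]:
    """Return (character count, UTF-8 byte count) for a string."""
    return len(text), len(text.encode("utf-8"))


def full_is_largest(autonomous_text: str, full_text: str) -> bool:
    # Compare characters, not bytes, matching the invariant described above.
    return len(autonomous_text) < len(full_text)
```

Because `§` takes two bytes in UTF-8, a guide can satisfy the character-count invariant while being larger on disk.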

…s "negative dose" branches; HAD only valid on the former

Reviewer correctly noted that the round-15/16 wording listed `HeterogeneousAdoptionDiD` as a routing alternative whenever `ContinuousDiD` fails on the dose-related preflights, but HAD itself requires non-negative dose support and raises on negative post-period dose at `had.py:1450-1459` (paper Section 2). On a panel with `dose_min < 0`, routing to HAD silently steers an agent into the same fit-time error. Verified the rejection at `had.py:1450-1459`. Reworded every site to split the two failure modes:
- Branch (a): `has_never_treated == False` (no zero-dose controls, but all observed doses non-negative). `ContinuousDiD` does not apply (Remark 3.1 not implemented). HAD IS a routing alternative on this branch (HAD's contract requires non-negative dose, which is satisfied here); linear DiD with a continuous covariate is another.
- Branch (e): `dose_min < 0` (negative treated doses). `ContinuousDiD` does not apply AND HAD is **not** a fallback either: HAD raises on negative post-period dose (`had.py:1450-1459`). Linear DiD with a signed continuous covariate is the applicable alternative on this branch.

Updated wording across:
- `diff_diff/profile.py` `TreatmentDoseShape` docstring (refactored from item-by-item duplication into a numbered list with a single "Routing alternatives when (1) or (5) fails" section that splits the two branches; trimmed redundancy).
- `diff_diff/guides/llms-autonomous.txt` §2 field reference (split the When-(1)-or-(5)-fails paragraph into the two branches).
- `diff_diff/guides/llms-autonomous.txt` §4.7 trailing paragraph (consolidated to a pointer at §2's split discussion).
- `diff_diff/guides/llms-autonomous.txt` §5.2 reasoning chain counter-example #4 (no-never-treated branch: HAD applies) and counter-example #5 (negative-dose branch: HAD does NOT apply, citing `had.py:1450-1459`).
- `CHANGELOG.md` Wave 2 entry.
- `ROADMAP.md` AI-Agent Track building block.
- `tests/test_profile_panel.py` two test docstrings/comments.

Added `test_autonomous_negative_dose_path_does_not_route_to_had` in `tests/test_guides.py`, asserting that §5.2 explicitly cites `had.py:1450-1459` on the negative-dose branch (it uses a single-line fingerprint because the prose phrase "non-negative dose support" is split across newlines in the rendered guide).

Length housekeeping: trimmed counter-example #4 and #5 prose + the §4.7 trailing paragraph to point at §2's split discussion; autonomous (65374 chars) < full (66031), `test_full_is_largest` green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
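The split routing guidance can be summarized as a small decision function. This is a hypothetical sketch of the guidance as worded in this round (the function and its return strings are not part of diff-diff's API), and it encodes only the two dose-related branches, not the other preflight checks.

```python
from typing import List


def dose_branch_alternatives(has_never_treated: bool, dose_min: float) -> List[str]:
    """Routing alternatives when ContinuousDiD's dose-related preflights fail.

    Returns an empty list when neither branch fires; the remaining
    preflight checks still apply in that case.
    """
    if dose_min < 0:
        # Negative treated doses: HAD also raises on negative post-period
        # dose, so only linear DiD with a signed covariate remains.
        return ["linear DiD with a signed continuous covariate"]
    if not has_never_treated:
        # Non-negative doses but no zero-dose controls (Remark 3.1 not implemented).
        return ["HeterogeneousAdoptionDiD", "linear DiD with a continuous covariate"]
    return []
```

Checking the negative-dose branch first matters: a panel with negative doses and no never-treated units would otherwise be routed to HAD, which raises on it.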

claude added 2 commits January 2, 2026 12:16

Fix license format in pyproject.toml for PyPI publishing

c89d294

Update license field from deprecated table format to simple string format to comply with modern setuptools standards and eliminate deprecation warnings.

Revert license format to PyPI-compatible table format

042046d

Changed from SPDX string format back to {text = "MIT"} format for compatibility with current PyPI infrastructure which does not yet support the License-Expression metadata field.

igerber merged commit 00d26c2 into main Jan 2, 2026

igerber deleted the claude/setup-pip-install-k2M4j branch January 3, 2026 12:52

igerber mentioned this pull request Apr 19, 2026

Close BR/DR gap #4: canonical-dataset regression guards + wording fixes #341

Merged

igerber added a commit that referenced this pull request Apr 20, 2026

Merge pull request #341 from igerber/br-dr-canonical-validation

752f2b6

Close BR/DR gap #4: canonical-dataset regression guards + wording fixes

igerber merged 2 commits into main from claude/setup-pip-install-k2M4j
